[Mellanox] Always restart thermalctld on Mellanox platform when it exits #24

Closed
wants to merge 1 commit into from

Conversation

Junchao-Mellanox
Owner

@Junchao-Mellanox Junchao-Mellanox commented Sep 15, 2020

- Why I did it

On the Mellanox platform, part of thermalctld's job is to handle user-space thermal policies for events such as fan or PSU removal. It works together with the kernel thermal algorithm to make sure the switch does not overheat.

Recently, we found that commit sonic-net@cbc75fe changed the autorestart configuration of thermalctld in supervisord, so thermalctld is no longer restarted automatically after being killed. This PR makes sure that thermalctld is always restarted on the Mellanox platform when it is killed.

- How I did it

  1. Add a variable "always_restart_thermalctld" to pmon_daemon_control.json.
  2. In docker-pmon.supervisord.conf.j2, check the variable "always_restart_thermalctld" and set the autorestart configuration for thermalctld accordingly.
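
The decision made in the two steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Jinja2 template from this PR: the helper names, the sample JSON contents, and the fallback value "unexpected" are assumptions made for the sketch. Only the key "always_restart_thermalctld" and the file names come from the description; "true", "false" and "unexpected" are the values supervisord accepts for autorestart.

```python
import json

# Hypothetical contents of pmon_daemon_control.json on a Mellanox platform;
# only the "always_restart_thermalctld" key is taken from this PR.
MELLANOX_DAEMON_CONTROL = json.loads('{"always_restart_thermalctld": true}')


def thermalctld_autorestart(daemon_control, default="unexpected"):
    """Pick the supervisord autorestart value for thermalctld.

    `default` stands in for whatever value the template uses when the
    flag is absent; "unexpected" is an assumption made for this sketch.
    """
    if daemon_control.get("always_restart_thermalctld", False):
        return "true"
    return default


def render_thermalctld_section(daemon_control):
    """Render a minimal, illustrative [program:thermalctld] section."""
    return "\n".join([
        "[program:thermalctld]",
        "autorestart=" + thermalctld_autorestart(daemon_control),
    ])


# Platform with the flag set, then a platform without it.
print(render_thermalctld_section(MELLANOX_DAEMON_CONTROL))
print(render_thermalctld_section({}))
```

Platforms that do not set the flag keep the default behaviour, so the change stays scoped to platforms that opt in.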

- How to verify it

Manual test

- Which release branch to backport (provide reason below if selected)

Depends on which branches sonic-net@cbc75fe is merged into.

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox Junchao-Mellanox changed the title Always restart thermalctld on Mellanox platform when it exits [Mellanox] Always restart thermalctld on Mellanox platform when it exits Sep 15, 2020
@Junchao-Mellanox
Owner Author

sonic-net#5375

@Junchao-Mellanox Junchao-Mellanox deleted the restart_thermalctld_for_mlnx branch December 15, 2020 01:42
Junchao-Mellanox pushed a commit that referenced this pull request Jul 1, 2021
Advance submodule update with the following changes:
4475750 Config reload fix (#29)
cf60d5e [ci]: add proper azp (#26)
f0fbfe7 [CI] Set up CI with Azure Pipelines (#25)
879d7bd Include port default fec configuration to be included in ZTP configuration (#24)
a6ae955 Add a pre-defined plugin to download a list of files (#23)
6f0305b [MultiDB] Add multidb support to sonic-ztp (#16)
Junchao-Mellanox pushed a commit that referenced this pull request Mar 9, 2022
ce72b0d Longxiang Lyu Thu Feb 24 06:05:12 2022 Put handler member functions as virtual in base (#30)
ef59e4f Jing Zhang Fri Feb 25 11:38:28 2022 Incrementing tolerance on mux state inconsistency (#27)
2d12892 Longxiang Lyu Wed Feb 16 03:32:06 2022 Rename LinkManagerStateMachine to ActiveStandbyStateMachine (#26)
f38634c Jing Zhang Thu Feb 17 17:23:56 2022 Update log level for mux probing and mux state chance (#23)
a8434dd Jing Zhang Thu Feb 17 17:21:01 2022 Handle xcvrd crashing scenarios (#22)
2ebdb2b Longxiang Lyu Mon Feb 14 13:26:07 2022 [make] Enable make extra includes (#24)
Junchao-Mellanox pushed a commit that referenced this pull request Mar 14, 2022
Changes:

Update submodule branch to 202012
[sonic-linkmgrd][202012] submodule update

a8ddff5 Jing Zhang Fri Feb 25 11:38:28 2022 Incrementing tolerance on mux state inconsistency (#27)
a3f78a3 Jing Zhang Thu Feb 17 17:23:56 2022 Update log level for mux probing and mux state chance (#23)
05156fb Jing Zhang Thu Feb 17 17:21:01 2022 Handle xcvrd crashing scenarios (#22)
74529ef Longxiang Lyu Mon Feb 14 13:26:07 2022 [make] Enable make extra includes (#24)

sign-off: Jing Zhang [email protected]
Junchao-Mellanox pushed a commit that referenced this pull request Jan 8, 2025
…sonic-net#21269)

#### Why I did it
src/dhcpmon
```
* e003522 - (HEAD -> master, origin/master, origin/HEAD) [Build] Update to buijld bookworm debian package (#24) (21 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
… automatically (sonic-net#635)

#### Why I did it
src/sonic-platform-common
```
* d9de488 - (HEAD -> 202412, origin/202412) [code sync] Merge code from sonic-net/sonic-platform-common:202411 to 202412 (#28) (7 hours ago) [mssonicbld]
* 30112ca - [code sync] Merge code from sonic-net/sonic-platform-common:202411 to 202412 (#26) (31 hours ago) [mssonicbld]
* a36263b - [code sync] Merge code from sonic-net/sonic-platform-common:202411 to 202412 (#24) (2 days ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…tically (sonic-net#723)

#### Why I did it
src/sonic-sairedis
```
* 5ee5610 - (HEAD -> 202412, origin/HEAD, origin/202412) [code sync] Merge code from sonic-net/sonic-sairedis:202411 to 202412 (#24) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…tomatically (sonic-net#754)

#### Why I did it
src/sonic-linux-kernel
```
* 3f0c0de - (HEAD -> 202412, origin/HEAD, origin/202412) [code sync] Merge code from sonic-net/sonic-linux-kernel:202411 to 202412 (#24) (20 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…omatically (sonic-net#770)

#### Why I did it
src/sonic-swss-common
```
* 9a7a61a - (HEAD -> 202412, origin/HEAD, origin/202412) [FC] remove FLEX_COUNTER_DELAY_STATUS_FIELD (sonic-net#982) (#24) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Jun 6, 2025
…UT so that we can get back to back Paladin ports up with Arista-7060X6-16PE-384C-O128S2 (sonic-net#1144)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it

Currently, when we load HWSKU `Arista-7060X6-16PE-384C-O128S2` on two Moby devices and connect their Paladin ports back to back, the links do not come up. It may help if we can get these links up and run the tests.

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Created a new `FANOUT` HWSKU containing special lanemap and polarity configs, so that we can load `Arista-7060X6-16PE-384C-O128S2` on one Moby and `Arista-7060X6-16PE-384C-O128S2-FANOUT` on the other, and get the Paladin ports up when connecting them back to back with the following setup:
```
Moby1                                    Moby2
HWSKU: Arista-7060X6-16PE-384C-O128S2    HWSKU: Arista-7060X6-16PE-384C-O128S2-FANOUT
#17 <-> #18
#19 <-> #20
#21 <-> #22
#23 <-> #24

#18 <-> #17
#20 <-> #19
#22 <-> #21
#24 <-> #23
```
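
The cabling in the table above follows a simple pattern that can be written down as a small helper. This is only an illustrative sketch: the function name `peer_port` is hypothetical, and it assumes the left column lists Moby1 ports and the right column Moby2 ports.

```python
def peer_port(port):
    """Return the port on the other Moby that `port` is cabled to.

    Encodes the table above: within 17..24, each odd port pairs with the
    next even port and each even port with the previous odd port.
    """
    if not 17 <= port <= 24:
        raise ValueError("port outside the back-to-back range 17..24")
    return port + 1 if port % 2 == 1 else port - 1


# List every Moby1 port together with the Moby2 port it connects to.
for port in range(17, 25):
    print(f"Moby1 #{port} <-> Moby2 #{peer_port(port)}")
```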

#### How to verify it
Verified that all the Paladin ports can link up with the above setup.

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305
- [x] msft-202412

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
- [x] msft-202412

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
Created `Arista-7060X6-16PE-384C-O128S2-FANOUT` based on `Arista-7060X6-16PE-384C-O128S2`, updating only the lanemap and polarity settings in the BCM config.

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)